76 research outputs found

    APEX2S: A Two-Layer Machine Learning Model for Discovery of host-pathogen protein-protein Interactions on Cloud-based Multiomics Data

    Faced with an avalanche of biological interaction data, computational biology now confronts greater challenges in big data analysis and calls for more studies that mine and integrate cloud-based multiomics data, especially data related to infectious diseases. Meanwhile, machine learning techniques have recently succeeded in a range of computational biology tasks. In this article, we focus on the study of host-pathogen protein-protein interactions, aiming to apply machine learning techniques to learn from the interaction data and make predictions. A comprehensive and practical workflow for harnessing different cloud-based multiomics data is discussed. In particular, a novel two-layer machine learning model, namely APEX2S, is proposed for the discovery of protein-protein interactions. The results show that our model can better learn and predict from the accumulated host-pathogen protein-protein interactions.

    Look Before You Leap: An Exploratory Study of Uncertainty Measurement for Large Language Models

    The recent performance leap of Large Language Models (LLMs) opens up new opportunities across numerous industrial applications and domains. However, erroneous generations, such as false predictions, misinformation, and hallucination made by LLMs, have also raised severe concerns about the trustworthiness of LLMs, especially in safety-, security- and reliability-sensitive scenarios, potentially hindering real-world adoption. While uncertainty estimation has shown its potential for interpreting the prediction risks made by general machine learning (ML) models, little is known about whether and to what extent it can help explore an LLM's capabilities and counteract its undesired behavior. To bridge the gap, in this paper, we initiate an exploratory study on the risk assessment of LLMs through the lens of uncertainty. In particular, we experiment with twelve uncertainty estimation methods and four LLMs on four prominent natural language processing (NLP) tasks to investigate to what extent uncertainty estimation techniques could help characterize the prediction risks of LLMs. Our findings validate the effectiveness of uncertainty estimation for revealing LLMs' uncertain/non-factual predictions. In addition to general NLP tasks, we conduct extensive experiments with four LLMs for code generation on two datasets. We find that uncertainty estimation can potentially uncover buggy programs generated by LLMs. Insights from our study shed light on the future design and development of reliable LLMs, facilitating further research toward enhancing the trustworthiness of LLMs. Comment: 20 pages, 4 figures

    Study on the Applicable Conditions for Protective Left-Turn Phase and Permissive Left-Turn Phase

    To improve the operating efficiency of intersections and make traffic management more scientific, this paper studies the conditions under which the protective left-turn phase and the permissive left-turn phase are applicable. Taking the traffic efficiency model as a constraint and using VISSIM simulation as the research tool, the paper compares traffic efficiency under different flow conditions and different control methods, so as to identify the specific traffic flow conditions to which each control method applies. This research aims to provide data support for the scientific application of traffic management.

    Integration of natural and deep artificial cognitive models in medical images: BERT-based NER and relation extraction for electronic medical records

    Introduction: Medical images and signals are important data sources in the medical field, and they contain key information such as patients' physiology, pathology, and genetics. However, the complexity and diversity of medical images and signals make medical knowledge acquisition and decision support difficult. Methods: To solve this problem, this paper proposes an end-to-end framework based on BERT for NER and RE tasks in electronic medical records. Our framework first integrates the NER and RE tasks into a unified model with end-to-end processing, which removes the limitations and error propagation of the multiple independent steps used in traditional methods. Second, by pre-training and fine-tuning the BERT model on large-scale electronic medical record data, we enable the model to obtain rich semantic representations adapted to the needs of medical fields and tasks. Finally, through multi-task learning, we enable the model to make full use of the correlation and complementarity between the NER and RE tasks, improving its generalization ability and effectiveness on different datasets. Results and discussion: We conduct an experimental evaluation on four electronic medical record datasets, and the model significantly outperforms other methods on the different datasets in the NER task. In the RE task, the EMLB model also achieved advantages on the different datasets; in the multi-task learning mode in particular, its performance improved significantly, and the ETE and MTL modules performed well in terms of overall precision and recall. Our research provides an innovative solution for medical image and signal data.

    Health Monitoring for Coated Steel Belts in an Elevator System

    This paper presents a method of health monitoring for coated steel belts in an elevator system that measures the electrical resistance of the ropes embedded in the belt. A model of the resistance change caused by fretting wear and stress fatigue has been established. Temperature and reciprocating cycles are also taken into consideration when determining the potential strength degradation of the belts. Experiments show that the method can effectively estimate the health degradation of the most dangerous section as well as of the other sections along the whole belt.

    Rapid detection of grass carp reovirus type 1 using RPA-based test strips combined with CRISPR Cas13a system

    Introduction: Grass carp reovirus (GCRV) causes frequent outbreaks of grass carp hemorrhagic disease, and its high pathogenicity and infectivity pose great challenges to the aquaculture industry. Because the pathogen is highly virulent, outbreaks of hemorrhagic disease often cause tremendous economic losses. Therefore, it is important to detect GCRV rapidly and accurately on site so that it can be controlled in time. Methods: In this study, recombinase polymerase amplification (RPA) combined with the clustered regularly interspaced short palindromic repeats (CRISPR)/Cas13a system was employed to establish a method for detecting the vp7 gene of grass carp reovirus type 1. Results can be read out by collecting the fluorescence signal, by visual fluorescence under ultraviolet excitation, or by test strip. Results: Combined with RPA amplification, the detection limit of the RPA-CRISPR method can reach 7.2 × 10^1 copies/μL of the vp7 gene per reaction, and the detection process can be completed within 1 h. In addition, this method showed no cross-reaction with 11 other common aquatic pathogens. The performance of the RPA-CRISPR/Cas13a detection method was then evaluated by comparing it with real-time fluorescence quantitative PCR detection on clinical samples. The results of RPA-CRISPR/Cas13a detection were consistent with those obtained from real-time fluorescence quantitative PCR detection. The coincidence rate of this method on 26 GCRV clinical samples was 92.31%. Discussion: In summary, this method has high sensitivity, specificity and on-site practicability for detecting GCRV type 1, and has great application potential in on-site GCRV monitoring.

    Large expert-curated database for benchmarking document similarity detection in biomedical literature search

    Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article/s. The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency-Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to completely capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for the downloading of annotation data and the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new powerful techniques for title and title/abstract-based search engines for relevant articles in biomedical research. Peer reviewed